Genetic Variability of Splicing Sites

Author

  • Dmitri Parkhomchuk
Abstract

Splicing sites provide unique statistics in the human genome owing to their large number and reasonably complete annotation. Analyses of the cumulative SNP distribution in splicing sites reveal a few interesting observations. While the degree of nucleotide conservation is reflected monotonically in SNP density, no detectable changes in the SNP frequency spectrum were found. Semi-conserved nucleotide sites predominantly harbor transition mutations. We propose that this transition preference is caused by co-evolution of a site with its corresponding binding agents. Since transitions in humans, and similarly in other organisms, are almost twice as frequent as transversions, this adaptation significantly lowers the mutation load.

Results

The sequences of approximately 330,000 splicing sites annotated in the NCBI build 35 human genome were extracted, along with all variable SNPs at these sites available in the HapMap SNP database for the CEU population [1]. These HapMap SNPs were genotyped reasonably homogeneously at splicing sites. An excessive genotyping density was observed at some highly conserved nucleotides (the GT and AG splicing sites) and in exons, probably reflecting the hunt for functional variants. However, this bias does not influence the main observations.

In line with [2], the variability and functional load of nucleotide sites were defined as illustrated in Fig. 1. The corresponding sequence logos are shown in Fig. 2. Nucleotide sites have a broad distribution of functional load and, as could be expected, the number of SNPs per site is proportional to the site variability. Exons have an apparently lower SNP density than introns, and for the donor exon one can observe traces of an increased number of SNPs at the third codon position, because a large part of the exons are in phase 0, i.e. the third codon position is the last coding nucleotide before the donor site.
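The site variability H and functional load 2 - H referred to above (Fig. 1) are the Shannon entropy of the base frequencies at an aligned position and its complement to the 2-bit maximum. The following is a minimal sketch of that calculation, not the author's pipeline; the function names and the toy columns are assumptions made for illustration.

```python
import math
from collections import Counter


def site_variability(column):
    """Shannon entropy H (in bits) of the base frequencies in one aligned column.

    `column` is a string holding the base observed at the same position
    in every annotated site (hypothetical input format).
    """
    counts = Counter(base for base in column.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no A, C, G or T")
    # Bases that never occur contribute nothing (0 * log 0 is taken as 0).
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def functional_load(column):
    """Site conservation, 2 - H: 0 bits for a neutral site, 2 bits for a fixed base."""
    return 2.0 - site_variability(column)


# Toy columns (made-up data): fully conserved, two equally good bases, neutral.
for column in ("GGGGGGGG", "AAAAGGGG", "ACGTACGT"):
    print(column, site_variability(column), functional_load(column))
```

A column split evenly between two bases sits exactly at the 1-bit boundary that separates semi-conserved from highly conserved sites in the text.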
However, no dependence of the SNP frequency spectrum on conservation was detected: the frequency distributions of SNPs at neutral sites and at conserved sites are indistinguishable. It could be expected that conserved sites carry more rare SNPs, because purifying selection prevents deleterious SNPs from rising in frequency. Although the statistics are rather large, with hundreds of SNPs per nucleotide site, no differences could be observed. However, the HapMap sample size (60 unrelated individuals and their 30 children) may be insufficient to detect differences for rare SNPs.

Inspecting the consensus sequences (Fig. 2), it is evident that for the majority of semi-conserved sites the next-best base is related to the top base by a transition. The probability that a random pair of bases is related by a transition is 1/3 (Fig. 4), so this far-from-random pattern presumably reflects optimisation for mutation load. Transitions are nearly twice as frequent as transversions in humans; thus, when the two best nucleotides for a given site are related by a transition, a random mutation is more likely to be "synonymous", i.e. not detrimental to site function. Fig. 4 shows the apparent dependence of the transversion-to-transition ratio on variability for the acceptor tail. This mechanism can work only for semi-conserved nucleotide sites with a functional load < 1 bit. At higher loads two equally good bases are impossible for an obvious reason: two equally probable states give an entropy of 1 bit, so a site with two equally good bases cannot have a functional load above 1 bit. Accordingly, for highly conserved sites there is indeed no significant preference for transitions (data not shown).

Conclusion

It is likely that most non-coding functionality is not yet characterized, and that the amount of it in large genomes may exceed the coding part [4]. Arguably, deciphering non-coding functionality is the next and hardest large-scale problem in genomics. Owing to the generality of information theory, the observations described here could usefully be extended to non-coding sequences en masse [4].
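The transition argument in the Results can be checked by simple enumeration. The sketch below counts how many of the possible base pairs are related by a transition, and then estimates how often a random mutation of the top base lands on the second-best base. The aggregate transition-to-transversion ratio of 2 is an assumption standing in for the "nearly twice" stated in the text, and the function names are illustrative.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True if a and b are distinct bases of the same chemical class."""
    pair = {a, b}
    return a != b and (pair <= PURINES or pair <= PYRIMIDINES)


def transition_pair_fraction():
    """Fraction of unordered base pairs that are related by a transition."""
    pairs = list(combinations("ACGT", 2))            # 6 unordered pairs
    hits = [p for p in pairs if is_transition(*p)]   # A-G and C-T
    return len(hits) / len(pairs)


def prob_hits_second_best(ts_tv_ratio, related_by_transition):
    """Probability that a random mutation of the top base yields the second-best base.

    ts_tv_ratio is the aggregate transition:transversion ratio (assumed value).
    Each base has one possible transition and two possible transversions.
    """
    p_transition = ts_tv_ratio / (ts_tv_ratio + 1.0)
    if related_by_transition:
        return p_transition
    return (1.0 - p_transition) / 2.0  # one specific transversion out of two


print(transition_pair_fraction())
print(prob_hits_second_best(2.0, True), prob_hits_second_best(2.0, False))
```

Under these assumptions, a site whose two best bases are related by a transition tolerates about two thirds of random mutations of the top base, against about one sixth when they are related by a transversion.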
Splicing sites, owing to their large statistics, may serve as a calibration reference for relative SNP density versus functional load. For example, at a functional load of 1 bit the SNP density falls by slightly more than a factor of two compared with neutral sequence (Fig. 3). An analogous decrease apparently occurs in the evolution of orthologous sequences. The adaptation of semi-functional sites (or, better to say, of their binding agents) to the prevalent transition mutations is analogous to the replacement-to-synonymous (R/S) mutation metric and can be equally useful in evolutionary analyses of non-coding sequences. This kind of adaptation to mutational bias appears to be quite ubiquitous, as it affects the genetic code itself [3], where it is quite transparent at the third codon position: nearly all transitions are strictly synonymous, in contrast to transversions.

References

1. Altshuler, D., Brooks, L.D., Chakravarti, A., Collins, F.S., Daly, M.J., Donnelly, P.; International HapMap Consortium. A haplotype map of the human genome. Nature 437(7063), 1299-1320 (2005).
2. Schneider, T.D. Evolution of biological information. Nucleic Acids Res. 28(14), 2794-2799 (2000).
3. Freeland, S.J., Hurst, L.D. The genetic code is one in a million. J. Mol. Evol. 47(3), 238-248 (1998).
4. Parkhomchuk, D.V. Di-nucleotide Entropy as a Measure of Genomic Sequence Functionality. http://xxx.lanl.gov/abs/q-bio.GN/0611059

Fig. 1. Schema of data processing. Approximately 330,000 annotated donor and acceptor sites in the human genome. Base frequencies for a given nucleotide site: $\{f_A, f_C, f_G, f_T\}$. Site variability: $H = -\sum_{i \in \{A,C,G,T\}} f_i \log_2 f_i$ (0-2 bits). Site conservation (or functional load): $2 - H$. "Sequence logo": total column height is proportional to conservation, letter heights to frequencies.
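The remark in the Conclusion about the third codon position can be checked directly against the standard genetic code. The sketch below is an illustration rather than part of the original analysis: it counts, over all 64 codons, how many third-position transitions and transversions leave the encoded amino acid unchanged. Stop codons are treated as their own class, which is an assumption of this sketch.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}


def third_position_counts():
    """Count synonymous and total third-position mutations, by mutation type.

    Every directed single-base change at the third position of every codon
    is counted once. Returns {type: (synonymous, total)}.
    """
    counts = {"transition": [0, 0], "transversion": [0, 0]}
    for codon, amino_acid in CODE.items():
        for new_base in BASES:
            if new_base == codon[2]:
                continue
            if TRANSITION_PARTNER[codon[2]] == new_base:
                kind = "transition"
            else:
                kind = "transversion"
            mutant = codon[:2] + new_base
            counts[kind][1] += 1
            if CODE[mutant] == amino_acid:
                counts[kind][0] += 1
    return {kind: tuple(v) for kind, v in counts.items()}


for kind, (synonymous, total) in third_position_counts().items():
    print(f"{kind}: {synonymous} of {total} third-position changes are synonymous")
```

The only non-synonymous third-position transitions in the standard code are between ATA and ATG (Ile/Met) and between TGA and TGG (stop/Trp), whereas roughly half of the third-position transversions change the amino acid.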

Similar resources

Identification of a Novel Splice Site Mutation in RUNX2 Gene in a Family with Rare Autosomal Dominant Cleidocranial Dysplasia

Introduction: Pathogenic variants of RUNX2, a gene that encodes an osteoblast-specific transcription factor, have been shown as the cause of CCD, which is a rare hereditary skeletal and dental disorder with dominant mode of inheritance and a broad range of clinical variability. Due to the relative lack of clinical complications resulting in CCD, the medical diagnosis of this disorder is challen...

Sodium Butyrate and Valproic Acid as Splicing Restoring Agents in Erythroid Cells of β-Thalassemic Patients

Background: β-Thalassemia is a common autosomal recessive disorder in humans caused by a defect in β-globin chain synthesis. The most common mutations causing β-Thalassemia have been found to be splicing mutations, most of which activate aberrant cryptic splicing sites without complete disruption of normal splicing. IVSI-110 mutation, a common splicing mutation, leads to a 90% reduction of norma...

The genetic basis for individual differences in mRNA splicing and APOBEC1 editing activity in murine macrophages.

Alternative splicing and mRNA editing are known to contribute to transcriptome diversity. Although alternative splicing is pervasive and contributes to a variety of pathologies, including cancer, the genetic context for individual differences in isoform usage is still evolving. Similarly, although mRNA editing is ubiquitous and associated with important biological processes such as intracellula...

A Nested-Splicing by Overlap Extension PCR Improves Specificity of this Standard Method

Background: Splicing by overlap extension (SOE) PCR is used to create mutations in the coding sequence of an enzyme in order to study the role of specific residues in the protein's structure and function. Objectives: We introduced a nested SOE-PCR (N-SOE-PCR) in order to increase the specificity of generating mutations in a gene by SOE-PCR. Materials and Methods: Genomic DNA from Bacillus thermo...

H2A.Z Nucleosome Positioning Has No Impact on Genetic Variation in Drosophila Genome

Nucleosome occupancy results in complex sequence variation rate heterogeneity by either increasing mutation rate or inhibiting DNA repair in yeast, fish, and human. H2A.Z nucleosome is extensively involved in gene transcription activation and regulation. To test whether H2A.Z nucleosome has the similar impact on sequence variability in the Drosophila genome, we profiled the H2A.Z nucleosome occ...

Publication date: 2006